reinforcement learning policy
Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer
In modern chip design, placement, which arranges millions of circuit modules, is an essential step that significantly influences power, performance, and area (PPA) metrics. Recently, reinforcement learning (RL) has emerged as a promising technique for improving placement quality, especially for macro placement. However, current RL-based placement methods suffer from long training times, poor generalization, and an inability to guarantee PPA results. A key issue lies in the problem formulation, i.e., using RL to place macros from scratch, which provides limited useful information and inaccurate rewards during training. In this work, we propose an approach that utilizes RL for the refinement stage, allowing the RL policy to learn how to adjust existing placement layouts; the policy thereby receives sufficient information to act and obtains relatively dense and precise rewards.
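To make the "refine rather than place from scratch" formulation concrete, here is a minimal sketch under loudly stated assumptions: macros are reduced to center points, nets are lists of macro indices, half-perimeter wirelength (HPWL) stands in for the paper's PPA-oriented objective, and a greedy chooser stands in for the learned policy. All names (`hpwl`, `refine_step`, `greedy_refine`) are hypothetical and are not the paper's implementation.

```python
# Hypothetical toy model: each macro is reduced to its center (x, y) and each
# net is the list of macro indices it connects. HPWL stands in for PPA.
Placement = list[tuple[float, float]]
Nets = list[list[int]]

def hpwl(placement: Placement, nets: Nets) -> float:
    """Half-perimeter wirelength: sum of each net's bounding-box width + height."""
    total = 0.0
    for net in nets:
        xs = [placement[i][0] for i in net]
        ys = [placement[i][1] for i in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def refine_step(placement: Placement, nets: Nets, macro: int,
                dx: float, dy: float) -> tuple[Placement, float]:
    """Move one macro of an EXISTING placement by (dx, dy).

    The reward is the HPWL reduction caused by this single adjustment, so every
    action receives immediate (dense) feedback. The input list is not modified.
    """
    before = hpwl(placement, nets)
    new_placement = list(placement)
    x, y = new_placement[macro]
    new_placement[macro] = (x + dx, y + dy)
    return new_placement, before - hpwl(new_placement, nets)

def greedy_refine(placement: Placement, nets: Nets,
                  moves: list[tuple[int, float, float]], steps: int) -> Placement:
    """Stand-in for a learned regulator policy: at each step apply the candidate
    move with the largest immediate reward, stopping when no move helps."""
    for _ in range(steps):
        new_placement, reward = max(
            (refine_step(placement, nets, m, dx, dy) for m, dx, dy in moves),
            key=lambda result: result[1],
        )
        if reward <= 0:
            break
        placement = new_placement
    return placement
```

For example, with macros at (0, 0), (10, 0), (10, 10) fully connected pairwise, HPWL is 40; moving macro 0 by (5, 5) lowers it to 30, so that single adjustment earns a reward of 10. The design point is that the state always contains a complete layout, so each action can be scored immediately.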
Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies
Pruning neural networks (NNs) can streamline them but risks removing parameters that are vital to the safety of reinforcement learning (RL) policies. We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. This method maintains safety in pruned RL policies, enhances understanding of their safety dynamics, and has proven effective in multiple RL settings.
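The core measurement (how much pruning one connection changes a safety probability) can be illustrated with a tiny example. This is a minimal sketch, not VERINTER: instead of a general model checker it solves the reachability equations of a small Markov chain exactly, and the corridor environment, slip probability, linear softmax policy, and function names are all hypothetical.

```python
import numpy as np

N_STATES = 5   # hypothetical corridor: state 0 is unsafe, state 4 is the goal
SLIP = 0.1     # hypothetical probability that the opposite move happens

def features(s: int) -> np.ndarray:
    # Hypothetical 2-d input: a bias term and the normalized position.
    return np.array([1.0, s / (N_STATES - 1)])

def action_probs(W: np.ndarray, s: int) -> np.ndarray:
    # Single linear layer + softmax over actions [left, right].
    logits = W @ features(s)
    z = np.exp(logits - logits.max())
    return z / z.sum()

def unsafe_reach_prob(W: np.ndarray) -> np.ndarray:
    """Exact P(eventually reach state 0) from every state of the Markov chain
    induced by the policy, obtained by solving the reachability linear system."""
    A = np.eye(N_STATES)
    b = np.zeros(N_STATES)
    b[0] = 1.0                         # unsafe state: probability 1 by definition
    for s in range(1, N_STATES - 1):   # goal state keeps probability 0
        p_left, p_right = action_probs(W, s)
        up = p_right * (1 - SLIP) + p_left * SLIP
        A[s, s + 1] -= up
        A[s, s - 1] -= 1 - up
    return np.linalg.solve(A, b)

def pruning_impact(W: np.ndarray, start: int) -> dict[tuple[int, int], float]:
    """Change in the unsafe-reach probability from `start` when each single
    connection (weight) is pruned to zero; positive means pruning hurts safety."""
    base = unsafe_reach_prob(W)[start]
    impact = {}
    for idx in np.ndindex(W.shape):
        pruned = W.copy()
        pruned[idx] = 0.0
        impact[idx] = float(unsafe_reach_prob(pruned)[start] - base)
    return impact
```

With all weights at zero the policy is uniform and the chain reduces to a fair random walk, so the unsafe-reach probability from the middle state is 0.5. A policy whose bias favors moving right lowers that value, and pruning exactly that bias weight raises it back, which is the kind of per-connection safety attribution the abstract describes.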
Fuzzy Ensembles of Reinforcement Learning Policies for Robotic Systems with Varied Parameters
Haddad, Abdel Gafoor, Mohiuddin, Mohammed B., Boiko, Igor, Zweiri, Yahya
Reinforcement Learning (RL) is an emerging approach to controlling many dynamical systems for which classical control approaches are inapplicable or insufficient. However, the resulting policies may not generalize to variations in the system's parameters. This paper presents a powerful yet simple algorithm that facilitates collaboration between RL agents trained independently to perform the same task but with different system parameters. Because the agents are independent, they can be trained in parallel on multi-core processors. Two examples are provided to demonstrate the effectiveness of the proposed technique. The main demonstration is a slung-load tracking problem for a quadrotor in a real-time experimental setup. The ensemble produced by the developed algorithm is shown to outperform the individual policies by reducing the RMSE tracking error. The robustness of the ensemble is also verified against wind disturbance.
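One plausible way to combine independently trained policies with fuzzy logic is sketched below. This is an assumption, not the paper's exact rule base: each policy is tied to the nominal parameter value it was trained on (for instance, a load mass), triangular membership functions are centered at those values, and the ensemble action is the membership-weighted average of the individual actions. The parameter estimate `theta` and all function names are hypothetical.

```python
from typing import Callable, Sequence
import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]  # observation -> action

def triangular_memberships(theta: float, centers: Sequence[float]) -> np.ndarray:
    """Triangular membership of parameter estimate `theta` in each policy's fuzzy set.

    `centers` must be sorted and distinct. Each set peaks at its own training
    parameter and falls to zero at the neighbouring centers, so the two active
    memberships sum to 1 between centers. Outside the range, theta saturates.
    """
    c = np.asarray(centers, dtype=float)
    theta = float(np.clip(theta, c[0], c[-1]))
    mu = np.zeros(len(c))
    for i in range(len(c)):
        left = c[i - 1] if i > 0 else c[i]
        right = c[i + 1] if i < len(c) - 1 else c[i]
        if theta == c[i]:
            mu[i] = 1.0
        elif left < theta < c[i]:
            mu[i] = (theta - left) / (c[i] - left)
        elif c[i] < theta < right:
            mu[i] = (right - theta) / (right - c[i])
    return mu

def fuzzy_ensemble_action(obs: np.ndarray, theta: float,
                          policies: Sequence[Policy],
                          centers: Sequence[float]) -> np.ndarray:
    """Membership-weighted average of the actions proposed by every policy."""
    mu = triangular_memberships(theta, centers)
    actions = np.stack([np.asarray(pi(obs), dtype=float) for pi in policies])
    return np.tensordot(mu, actions, axes=1) / mu.sum()
```

For three policies trained at parameters 1, 2, and 3, an estimate of 1.5 gives memberships (0.5, 0.5, 0), so the ensemble blends the first two policies equally. Since each policy is trained in isolation, this combination step is also what makes the parallel training mentioned above possible.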
On Generating Explanations for Reinforcement Learning Policies: An Empirical Study
Yuasa, Mikihisa, Tran, Huy T., Sreenivas, Ramavarapu S.
In this paper, we introduce a set of Linear Temporal Logic (LTL) formulae designed to provide explanations for policies. Our focus is on crafting explanations that elucidate both the ultimate objectives accomplished by the policy and the prerequisites it upholds throughout its execution. These LTL-based explanations have a structured representation that is particularly well suited to local-search techniques. The effectiveness of our approach is illustrated in a simulated capture-the-flag environment. The paper concludes with suggested directions for future research.
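The two kinds of explanation map naturally onto two LTL operators: "eventually p" (F p) for objectives the policy accomplishes and "always q" (G q) for prerequisites it upholds. The sketch below checks these operators on finite rollouts; it is a minimal illustration under assumptions, with hypothetical boolean state features and an exhaustive scan of a small candidate set standing in for the paper's richer formula set and local search.

```python
from typing import Callable

State = dict[str, bool]          # hypothetical: boolean features of one state
Prop = Callable[[State], bool]   # an atomic proposition over a state

def eventually(p: Prop, trace: list[State]) -> bool:
    """F p under finite-trace semantics: p holds in at least one state."""
    return any(p(s) for s in trace)

def always(p: Prop, trace: list[State]) -> bool:
    """G p under finite-trace semantics: p holds in every state."""
    return all(p(s) for s in trace)

def explain(traces: list[list[State]], props: dict[str, Prop]) -> list[str]:
    """For each candidate proposition, report the strongest formula that holds
    on every rollout: 'G p' (upheld throughout) or else 'F p' (eventually achieved)."""
    formulas = []
    for name, p in props.items():
        if all(always(p, t) for t in traces):
            formulas.append(f"G {name}")
        elif all(eventually(p, t) for t in traces):
            formulas.append(f"F {name}")
    return formulas
```

On two toy capture-the-flag rollouts in which the agent stays alive throughout and picks up the flag at the end, `explain` returns `["F has_flag", "G alive"]`: the flag is an accomplished objective and staying alive is an upheld prerequisite.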
Low-Rank Representation of Reinforcement Learning Policies
Mazoure, Bogdan (McGill University) | Doan, Thang (McGill University) | Li, Tianyu (McGill University) | Makarenkov, Vladimir (UQÀM University) | Pineau, Joelle (McGill University) | Precup, Doina (Facebook AI Research) | Rabusseau, Guillaume (CIFAR AI Chair)
We propose a general framework for policy representation in reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy in a reproducing kernel Hilbert space (RKHS). The use of RKHS-based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models but are highly desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.
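As a rough illustration of what a low-dimensional embedding of a policy in an RKHS can look like, the sketch below represents a 1-D deterministic policy (state to mean action) as a Gaussian-kernel expansion over k landmark states, so the policy is summarized by k coefficients. This is a swapped-in, simplified technique (regularized least squares onto fixed landmarks), not the paper's construction, and it says nothing about the return guarantees; the landmarks, bandwidth, ridge value, and function names are hypothetical.

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel matrix k(a_i, b_j) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bandwidth**2))

def embed_policy(states: np.ndarray, actions: np.ndarray, landmarks: np.ndarray,
                 bandwidth: float = 0.5, ridge: float = 1e-6) -> np.ndarray:
    """Return alpha (k coefficients) with pi(s) ~= sum_j alpha_j * k(s, landmark_j).

    The coefficient vector alpha is the low-dimensional embedding: the policy
    lives in the k-dimensional subspace of the RKHS spanned by the landmarks.
    """
    K = rbf(states, landmarks, bandwidth)                 # (n, k) design matrix
    # Regularized least squares: (K^T K + ridge * I) alpha = K^T actions
    return np.linalg.solve(K.T @ K + ridge * np.eye(len(landmarks)), K.T @ actions)

def reconstruct(alpha: np.ndarray, landmarks: np.ndarray, states: np.ndarray,
                bandwidth: float = 0.5) -> np.ndarray:
    """Evaluate the embedded policy at new states."""
    return rbf(states, landmarks, bandwidth) @ alpha
```

For a smooth toy policy such as `np.tanh(s)` sampled at 200 states on [-2, 2], ten landmarks are enough to reconstruct the actions with small error, which mirrors, in miniature, the abstract's claim that policies can be represented in a low-dimensional space with little loss.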
Explaining Reinforcement Learning Policies through Counterfactual Trajectories
Frost, Julius, Watkins, Olivia, Weiner, Eric, Abbeel, Pieter, Darrell, Trevor, Plummer, Bryan, Saenko, Kate
For humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test time. Some policy interpretability methods facilitate this by capturing the policy's decision-making in a set of agent rollouts. However, even the most informative trajectories of training-time behavior may give little insight into the agent's behavior out of distribution. In contrast, our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution. We generate these trajectories by guiding the agent to more diverse unseen states and showing the agent's behavior there. In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
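The generation step, "guide the agent to unseen states, then show its behavior there", can be sketched as a two-phase rollout. This is a minimal illustration under assumptions: a hypothetical integer-line environment, hand-written guide action sequences in place of the paper's guidance mechanism, and hypothetical function names. Only the agent-controlled segment is returned, because that is the behavior a user would inspect.

```python
from typing import Callable, Sequence

Step = Callable[[int, int], int]   # (state, action) -> next state
Policy = Callable[[int], int]      # state -> action

def counterfactual_rollout(start: int, step: Step, agent: Policy,
                           guide: Sequence[int], horizon: int) -> list[tuple[int, int]]:
    """Follow `guide` actions to reach a (possibly unseen) state, then let the
    agent act for `horizon` steps. Only the agent's (state, action) pairs are returned."""
    s = start
    for a in guide:            # guided prefix: not the agent's own behavior
        s = step(s, a)
    shown = []
    for _ in range(horizon):   # agent-controlled segment shown to the user
        a = agent(s)
        shown.append((s, a))
        s = step(s, a)
    return shown

def counterfactual_set(start: int, step: Step, agent: Policy,
                       guides: Sequence[Sequence[int]],
                       horizon: int) -> list[list[tuple[int, int]]]:
    """One displayed trajectory per guide; diverse guides widen the state distribution."""
    return [counterfactual_rollout(start, step, agent, g, horizon) for g in guides]
```

For instance, with `step = lambda s, a: s + a` and an agent that moves toward the origin, the guides `[]`, `[1, 1, 1]`, and `[-1, -1]` start the agent's segment at states 0, 3, and -2, exposing behavior on both sides of the start state rather than only around it.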